PEPTIDE AND PROTEIN FUSIONS TO THIOREDOXIN
AND THIOREDOXIN-LIKE MOLECULES

The invention relates generally to the production of fusion
proteins in prokaryotic and eukaryotic cells. More specifically,
the invention relates to the expression in host cells of
recombinant fusion sequences comprising thioredoxin or
thioredoxin-like sequences fused to sequences for selected
heterologous peptides or proteins, and the use of such fusion
molecules to increase the production, activity, stability or
solubility of recombinant proteins and peptides.

Background of the Invention 

Many peptides and proteins can be produced via recombinant
means in a variety of expression systems, e.g., various strains
of bacterial, fungal, mammalian or insect cells. However, when 
bacteria are used as host cells for heterologous gene expression, 
several problems frequently occur. 

For example, heterologous genes encoding small peptides are 
often poorly expressed in bacteria. Because of their size, most 
small peptides are unable to adopt stable, soluble conformations
and are subject to intracellular degradation by proteases and
peptidases present in the host cell. Those small peptides which
do manage to accumulate when directly expressed in E. coli or
other bacterial hosts are usually found in the insoluble or
"inclusion body" fraction, an occurrence which renders them 
almost useless for screening purposes in biological or 
biochemical assays. 

Moreover, even if small peptides are not produced in
inclusion bodies, the production of small peptides by recombinant
means as candidates for new drugs or enzyme inhibitors encounters
further problems. Even small linear peptides can adopt an
enormous number of potential structures due to their degrees of
conformational freedom. Thus a small peptide can have the
"desired" amino-acid sequence and yet have very low activity in
an assay because the "active" peptide conformation is only one of
the many alternative structures adopted in free solution. This
presents another difficulty encountered in producing small
heterologous peptides recombinantly for effective research and
therapeutic use.

Inclusion body formation is also frequently observed when
the genes for heterologous proteins are expressed in bacterial
cells. These inclusion bodies usually require further
manipulations in order to solubilize and refold the heterologous
protein, with conditions determined empirically and with
uncertainty in each case.

If these additional procedures are not successful, little to
no protein retaining bioactivity can be recovered from the host
cells. Moreover, these additional processes are often
technically difficult and prohibitively expensive for practical
production of recombinant proteins for therapeutic, diagnostic or
other research uses.

To overcome these problems, the art has employed certain
peptides or proteins as fusion "partners" with a desired
heterologous peptide or protein to enable the recombinant
expression and/or secretion of small peptides or larger proteins
as fusion proteins in bacterial expression systems. Among such
fusion partners are included lacZ and trpE fusion proteins,
maltose-binding protein fusions, and glutathione-S-transferase
fusion proteins [See, generally, Current Protocols in Molecular
Biology, Vol. 2, suppl. 10, publ. John Wiley and Sons, New York,
NY, pp. 16.4.1-16.8.1; and Smith et al, Gene 67:31-40
(1988)]. U.S. Patent 4,801,536 describes the fusion of a
bacterial flagellin protein to a desired protein to enable the
production of a heterologous gene in a bacterial cell and its
secretion into the culture medium as a fusion protein. PCT
Patent Publication WO91/11454 discloses fusion proteins using
biotinylated renin as the fusion partner. The renin is
immobilized on a purification column to facilitate separation and
cleavage.

However, fusions of desired peptides or proteins to other
proteins (i.e., as fusion partners) at the amino- or
carboxyl-termini of these fusion partner proteins often have
other potential disadvantages. Experience in E. coli has shown
that a crucial factor in obtaining high levels of gene expression
is the efficiency of translational initiation. Translational
initiation in E. coli is very sensitive to the nucleotide sequence
surrounding the initiating methionine codon of the desired
heterologous peptide or protein sequence, although the rules
governing this phenomenon are not clear. For this reason,
fusions of sequences at the amino-terminus of many fusion partner
proteins affect expression levels in an unpredictable manner.
In addition there are numerous amino- and carboxy-peptidases in
E. coli which degrade amino- or carboxyl-terminal peptide
extensions to fusion partner proteins so that a number of the
known fusion partners have a low success rate for producing
stable fusion proteins.

The purification of proteins produced by recombinant
expression systems is often a serious challenge. There is a
continuing requirement for new and easier methods to produce
homogeneous preparations of recombinant proteins, and yet a
number of the fusion partners currently used in the art possess
no inherent properties that would facilitate the purification
process. Therefore, in the art of recombinant expression
systems, there remains a need for new compositions and processes
for the production and purification of stable, soluble peptides
and proteins for use in research, diagnostic and therapeutic
applications.



Summary of the Invention

In one aspect, the invention provides a fusion sequence
comprising a thioredoxin-like protein sequence fused to a
selected heterologous peptide or protein. The peptide or protein
may be fused to the amino terminus of the thioredoxin-like
sequence, the carboxyl terminus of the thioredoxin-like sequence,
or within the thioredoxin-like sequence (e.g., within the
active-site loop of thioredoxin). The fusion sequence according
to this invention may optionally contain a linker peptide between
the thioredoxin-like sequence and the selected peptide or protein.
This linker provides, where needed, a selected cleavage site or a
stretch of amino acids capable of preventing steric hindrance
between the thioredoxin-like molecule and the selected peptide or
protein.

As another aspect, the invention provides a DNA molecule
encoding the fusion sequence defined above in association with,
and under the control of, an expression control sequence capable
of directing the expression of the fusion protein in a desired
host cell.

Still a further aspect of the invention is a host cell
transformed with, or having integrated into its genome, a DNA
sequence comprising a thioredoxin-like DNA sequence fused to the
DNA sequence of a selected heterologous peptide or protein. This
fusion sequence is desirably under the control of an expression
control sequence capable of directing the expression of a fusion
protein in the cell.

As yet another aspect, there is provided a novel method for
increasing the expression of soluble recombinant proteins. The
method includes culturing under suitable conditions the
above-described host cell to produce the fusion protein.

In one embodiment of this method, if the resulting fusion
protein is cytoplasmic, the cell can be lysed by conventional
means to obtain the soluble fusion protein. More preferably in
the case of cytoplasmic fusion proteins, the method includes
releasing the fusion protein from the host cell by applying
osmotic shock or freeze/thaw treatments to the cell. In this
case the fusion protein is selectively released from the interior
of the cell via the zones of adhesion that exist between the
inner and outer membranes of E. coli. The fusion protein is then
purified by conventional means.

In another embodiment of the method, if a secretory leader
is employed in the fusion protein construct, the fusion protein
can be recovered from a periplasmic extract or from the cell
culture medium.

An additional step in both of these methods is cleavage of
the desired protein from the thioredoxin-like protein by
conventional means.

Other aspects and advantages of the present invention will
be apparent upon consideration of the following detailed
description of preferred embodiments thereof.

Summary of the Drawings

Fig. 1 illustrates the DNA sequence of the expression
plasmid pALtrxA/EK/IL11ΔPro-581 and the amino acid sequence for
the fusion protein therein, described in Example 1.

Fig. 2 illustrates the DNA sequence and amino acid sequence
of the macrophage inhibitory protein-1α (MIP-1α) protein used in
the construction of a thioredoxin fusion protein described in
Example 3.

Fig. 3 illustrates the DNA sequence and amino acid sequence
of the bone morphogenetic protein-2 (BMP-2) protein used in the
construction of a thioredoxin fusion protein described in Example
4.

Fig. 4 is a schematic drawing illustrating the insertion of
an enterokinase cleavage site into the active-site loop of E.
coli thioredoxin (trxA) described in Example 5.

Fig. 5 is a schematic drawing illustrating random peptide
insertions into the active-site loop of E. coli thioredoxin
(trxA) described in Example 5.

Fig. 6 illustrates the DNA sequence and amino acid sequence
of the human interleukin-6 (IL-6) protein used in the
construction of a thioredoxin fusion protein described in Example
6.

Fig. 7 illustrates the DNA sequence and amino acid sequence
of the M-CSF protein used in the construction of a thioredoxin
fusion protein described in Example 7.

Detailed Description of the Invention

This invention permits the production of large amounts of
heterologous peptides or proteins in a stable, soluble form in
certain host cells that normally express limited amounts of such
peptides or proteins. It enables release of the fusion protein
from the production cells without the necessity of lysing the
cells, thereby streamlining the purification process. Also, by
using a small peptide insert in an internal region of the
thioredoxin-like sequence (e.g. the active site loop of
thioredoxin) the invention provides a ready cleavage site,
accessible on the surface of the molecule. The fusion proteins
of this invention also permit the desired peptide or protein to
achieve its desired conformation.

According to the present invention, the DNA sequence
encoding a heterologous peptide or protein selected for
expression in a recombinant system is fused to a thioredoxin-like
DNA sequence for expression in the host cell. A thioredoxin-like
DNA sequence is defined herein as a DNA sequence encoding a
protein or fragment of a protein characterized by an amino acid
sequence having at least 18% homology with the amino acid
sequence of E. coli thioredoxin over an amino acid sequence
length of 80 amino acids. Alternatively, a thioredoxin DNA
sequence is defined as a DNA sequence encoding a protein or
fragment of a protein characterized by a crystalline structure
substantially similar to that of human or E. coli thioredoxin.
The DNA sequence of glutaredoxin is one such sequence. The amino
acid sequence of E. coli thioredoxin is described in H. Eklund et
al, EMBO J. 3:1443-1449 (1984). The three-dimensional structure
of E. coli thioredoxin is depicted in Fig. 2 of A. Holmgren, J.
Biol. Chem. 264:13963-13966 (1989). Fig. 1 below, at nucleotides
2242-2568, contains a DNA sequence encoding the E. coli
thioredoxin protein [Lim et al, J. Bacteriol. 163:311-316
(1985)]. The three latter publications are incorporated herein
by reference for the purpose of providing information on
thioredoxin which is known to one of skill in the art.
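
The sequence criterion above (at least 18% over a length of 80
amino acids) can be illustrated with a short sketch. The Python
below is illustrative only and forms no part of the disclosed
method. It assumes that "homology" is read as percent identity
over an ungapped window, and the function names
best_window_identity and is_thioredoxin_like are hypothetical. A
practical comparison would ordinarily rely on a gapped alignment
program instead.

```python
def best_window_identity(query: str, reference: str, window: int = 80) -> float:
    """Highest percent identity over any ungapped window of `window` residues.

    Assumption: "homology" is interpreted as simple residue identity.
    Returns 0.0 if either sequence is shorter than the window.
    """
    if len(query) < window or len(reference) < window:
        return 0.0
    best = 0.0
    for i in range(len(query) - window + 1):
        q = query[i:i + window]
        for j in range(len(reference) - window + 1):
            r = reference[j:j + window]
            matches = sum(a == b for a, b in zip(q, r))
            best = max(best, 100.0 * matches / window)
    return best


def is_thioredoxin_like(query: str, reference: str,
                        window: int = 80, threshold: float = 18.0) -> bool:
    """Apply the 18%-over-80-residues criterion under the assumption above."""
    return best_window_identity(query, reference, window) >= threshold
```

With the default window of 80 and threshold of 18.0, a candidate
qualifies under this reading when at least 15 of 80 compared
residues match (15/80 = 18.75%).
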

As the primary example of a thioredoxin-like protein useful
in this invention, E. coli thioredoxin has the following
characteristics. E. coli thioredoxin is a small protein, only
11.7 kD, and can be expressed to high levels (>10%, corresponding
to a concentration of 15 µM if cells are lysed at 10 A550/ml).
The small size and capacity for high expression of the protein
contribute to a high intracellular concentration. E. coli
thioredoxin is further characterized by a very stable, tight
structure which can minimize the effects on overall structural
stability caused by fusion to the desired peptide or proteins.
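
As a rough check of the figures quoted above, the molar
concentration can be converted to a mass concentration. The
sketch below is illustrative only. It assumes that the ">10%"
figure refers to the fraction of total cell protein, which the
text does not state explicitly, and the function and variable
names are hypothetical.

```python
def mass_conc_mg_per_ml(molar_conc_uM: float, mol_weight_kDa: float) -> float:
    """Convert a micromolar concentration to mg/ml.

    1 uM of a 1 kDa protein is 1e-6 mol/L * 1000 g/mol = 1e-3 g/L,
    and 1 g/L equals 1 mg/ml.
    """
    return molar_conc_uM * mol_weight_kDa * 1e-3


# Values quoted in the text: 15 uM of an 11.7 kD protein.
trx_mg_per_ml = mass_conc_mg_per_ml(15.0, 11.7)

# Assumption: thioredoxin makes up 10% of total protein in the lysate.
implied_total_mg_per_ml = trx_mg_per_ml / 0.10

print(f"thioredoxin: {trx_mg_per_ml:.4f} mg/ml")
print(f"implied total protein: {implied_total_mg_per_ml:.3f} mg/ml")
```

Under these assumptions the quoted 15 µM corresponds to roughly
0.18 mg/ml of thioredoxin in the lysate.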

The three-dimensional structure of E. coli thioredoxin is
known. It contains several surface loops, including a unique
active site loop between residues Cys33 and Cys36 which protrudes
from the body of the protein. This active site loop is an
identifiable, accessible surface loop region and is not involved
in any interactions with the rest of the protein that contribute
to overall structural stability. It is therefore a good
candidate as a site for peptide insertions. Both the amino- and
carboxyl-termini of E. coli thioredoxin are on the surface of the
protein, and are readily accessible for fusions.

E. coli thioredoxin is also stable to proteases. Thus, E.
coli thioredoxin may be desirable for use in E. coli expression
systems, because as an E. coli protein it is characterized by
stability to E. coli proteases. E. coli thioredoxin is also
stable to heat up to 80 °C and to low pH. Other thioredoxin-like
proteins encoded by thioredoxin-like DNA sequences useful in this
invention may share the homologous amino acid sequences, and
similar physical and structural characteristics. Thus, DNA
sequences encoding other thioredoxin-like proteins may be used in
place of E. coli thioredoxin according to this invention. For
example, the DNA sequence encoding other species' thioredoxin,
e.g., human thioredoxin, may be employed in the compositions and
methods of this invention. Both the primary sequence and
computer-predicted secondary structures of human and E. coli
thioredoxins are very similar. Human thioredoxin also carries
the same active site loop as is found in the E. coli protein.
Insertions into the human thioredoxin active site loop and on the
amino and carboxyl termini may be as well tolerated as those in
E. coli thioredoxin.

Other thioredoxin-like sequences which may be employed in
this invention include all or portions of the proteins
glutaredoxin and various species' homologs thereof [A. Holmgren,
cited above]. Although E. coli glutaredoxin and E. coli
thioredoxin share less than 20% amino acid homology, the two
proteins do have conformational and functional similarities
[Eklund et al, EMBO J. 3:1443-1449 (1984)].

All or a portion of the DNA sequence encoding protein
disulfide isomerase (PDI) and various species' homologs thereof
[J. E. Edman et al, Nature 317:267-270 (1985)] may also be
employed as a thioredoxin-like DNA sequence, since a repeated
domain of PDI shares >18% homology with E. coli thioredoxin. The
two latter publications are incorporated herein by reference for
the purpose of providing information on glutaredoxin and PDI
which is known and available to one of skill in the art.

Similarly the DNA sequence encoding phosphoinositide-specific
phospholipase C (PI-PLC), fragments thereof and various
species' homologs thereof [C. F. Bennett et al, Nature
334:268-270 (1988)] may also be employed in the present invention
as a thioredoxin-like sequence based on the amino acid sequence
homology with E. coli thioredoxin. All or a portion of the DNA
sequence encoding an endoplasmic reticulum protein, such as
ERp72, or various species homologs thereof are also included as
thioredoxin-like DNA sequences for the purposes of this invention
[R. A. Mazzarella et al, J. Biol. Chem. 265:1094-1101 (1990)]
based on amino acid sequence homology. Another thioredoxin-like
sequence is a DNA sequence which encodes all or a portion of an
adult T-cell leukemia-derived factor (ADF) or other species
homologs thereof [N. Wakasugi et al, Proc. Natl. Acad. Sci., USA
87:8282-8286 (1990)] based on amino acid sequence homology to E.
coli thioredoxin. The three latter publications are incorporated
herein by reference for the purpose of providing information on
PI-PLC, ERp72, and ADF which are known and available to one of
skill in the art.

It is expected from the definition of thioredoxin-like DNA
sequence used above that other sequences not specifically
identified above, or perhaps not yet identified or published, may
be useful as thioredoxin-like sequences based on their amino acid
sequence similarities to E. coli thioredoxin and characteristic
crystalline structural similarities to E. coli thioredoxin and
the other thioredoxin-like proteins. Based on the above
description, one of skill in the art should be able to select and
identify, or, if desired, modify, a thioredoxin-like DNA sequence
for use in this invention without resort to undue
experimentation. For example, simple point mutations made to
portions of native thioredoxin or native thioredoxin-like
sequences which do not affect the structure of the resulting
molecule are alternative thioredoxin-like sequences, as are
allelic variants of native thioredoxin or native thioredoxin-like
sequences.

DNA sequences which hybridize to the sequence for E. coli
thioredoxin or its structural homologs under either stringent or
relaxed hybridization conditions also encode thioredoxin-like
proteins for use in this invention. Stringent hybridization is
defined herein as hybridization at 4XSSC at 65 °C, followed by a
washing in 0.1XSSC at 65 °C for an hour. Alternatively stringent
hybridization is defined as hybridization in 50% formamide, 4XSSC
at 42 °C. Non-stringent hybridization is defined herein as
hybridizing in 4XSSC at 50 °C, or hybridization with 30-40%
formamide at 42 °C. The use of all such thioredoxin-like
sequences is believed to be encompassed in this invention.

Construction of a fusion sequence of the present invention,
which comprises the DNA sequence of a selected peptide or protein
and the DNA sequence of a thioredoxin-like sequence, employs
conventional genetic engineering techniques [see, Sambrook et al,
Molecular Cloning, A Laboratory Manual, Cold Spring Harbor
Laboratory, Cold Spring Harbor, New York (1989)]. Fusion
sequences may be prepared in a number of different ways. For
example, the selected heterologous protein may be fused to the
amino terminus of the thioredoxin-like molecule. Alternatively,
the selected protein sequence may be fused to the carboxyl
terminus of the thioredoxin-like molecule. Small peptide
sequences could also be fused to either of the above-mentioned
positions of the thioredoxin-like sequence to produce them in a
structurally unconstrained manner.

This fusion of a desired heterologous peptide or protein to
the thioredoxin-like protein increases the stability of the
peptide or protein. At either the amino or carboxyl terminus,
the desired heterologous peptide or protein is fused in such a
manner that the fusion does not destabilize the native structure
of either protein. Additionally, fusion to the soluble
thioredoxin-like protein improves the solubility of the selected
heterologous peptide or protein.

It may be preferred for a variety of reasons that peptides
be fused within the active site loop of the thioredoxin-like
molecule. The face of thioredoxin surrounding the active site
loop has evolved, in keeping with the protein's major function as
a nonspecific protein disulfide oxido-reductase, to be able to
interact with a wide variety of protein surfaces. The active
site loop region is found between segments of strong secondary
structure and offers many advantages for peptide fusions. A
small peptide inserted into the active-site loop of a
thioredoxin-like protein is present in a region of the protein
which is not involved in maintaining tertiary structure.
Therefore the structure of such a fusion protein should be
stable. Previous work has shown that E. coli thioredoxin can be
cleaved into two fragments at a position close to the active site
loop, and yet the tertiary interactions stabilizing the protein
remain.

The active site loop of E. coli thioredoxin has the sequence
NH2...Cys33-Gly-Pro-Cys36...COOH. Fusing a selected peptide with
a thioredoxin-like protein in the active loop portion of the
protein constrains the peptide at both ends, reducing the degrees
of conformational freedom of the peptide, and consequently
reducing the number of alternative structures taken by the
peptide. The inserted peptide is bound at each end by cysteine
residues, which may form a disulfide linkage to each other as
they do in native thioredoxin and further limit the
conformational freedom of the inserted peptide.
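
The arrangement described above, a peptide held between the two
active-site cysteines, can be sketched at the level of amino acid
strings. The Python below is illustrative only. It assumes the
insertion point lies between Gly and Pro of the Cys-Gly-Pro-Cys
motif, which this passage does not specify, and the scaffold and
peptide strings used with it are hypothetical toy sequences, not
the thioredoxin sequence. The function names are likewise
hypothetical.

```python
ACTIVE_SITE_MOTIF = "CGPC"  # Cys-Gly-Pro-Cys, the loop sequence given in the text


def insert_into_active_site_loop(scaffold: str, peptide: str) -> str:
    """Insert `peptide` into a scaffold carrying exactly one CGPC motif.

    Assumption: the peptide is placed between Gly and Pro, so that it
    remains flanked by the two cysteines of the motif.
    """
    first = scaffold.find(ACTIVE_SITE_MOTIF)
    if first < 0:
        raise ValueError("scaffold has no Cys-Gly-Pro-Cys motif")
    if scaffold.find(ACTIVE_SITE_MOTIF, first + 1) >= 0:
        raise ValueError("scaffold has more than one Cys-Gly-Pro-Cys motif")
    split = first + 2  # position just after Cys-Gly
    return scaffold[:split] + peptide + scaffold[split:]


def is_flanked_by_cysteines(fusion: str, peptide: str) -> bool:
    """Check for Cys-Gly before and Pro-Cys after the inserted peptide."""
    idx = fusion.find(peptide)
    if idx < 2:
        return False
    end = idx + len(peptide)
    return fusion[idx - 2:idx] == "CG" and fusion[end:end + 2] == "PC"
```

For example, inserting the toy peptide "DDDDK" into the toy
scaffold "AAWCGPCKAA" with this function places the peptide
between "CG" and "PC", so both ends of the insert remain tied to
the scaffold.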

Moreover, this invention places the peptide on the surface
of the thioredoxin-like protein. Thus the invention provides a
distinct advantage for use of the peptides in screening for
bioactive peptide conformations and other assays by presenting
peptides inserted in the active site loop in this structural
context.

Additionally the fusion of a peptide into the loop protects
it from the actions of E. coli amino- and carboxyl-peptidases.
Further, a restriction endonuclease cleavage site RsrII already
exists in the portion of the E. coli thioredoxin DNA sequence
encoding the loop region at precisely the correct position for a
peptide fusion [see Figure 4]. RsrII recognizes the DNA sequence
CGG(A/T)CCG leaving a three nucleotide long 5'-protruding sticky
end. DNA bearing the complementary sticky ends will therefore
insert at this site in just one orientation.
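
The reason a three-nucleotide overhang at this site fixes the
orientation of an insert can be shown with a short sketch. The
Python below is illustrative only. The recognition sequence is
taken from the text; the cut position within the site (after the
second nucleotide on each strand, giving a GAC or GTC overhang)
is an assumption drawn from the general enzyme literature and not
from this document, and all function names are hypothetical.

```python
import re

# RsrII recognition sequence as given in the text: CGG(A/T)CCG
RSRII_SITE = re.compile(r"CGG[AT]CCG")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an A/C/G/T sequence."""
    return dna.upper().translate(_COMPLEMENT)[::-1]


def find_rsrii_sites(dna: str) -> list[int]:
    """Return 0-based start positions of RsrII sites on the given strand."""
    return [m.start() for m in RSRII_SITE.finditer(dna.upper())]


def top_strand_overhang(dna: str, site_start: int) -> str:
    """Three-nucleotide 5' overhang, assuming the cut CG^G(A/T)CCG."""
    return dna.upper()[site_start + 2:site_start + 5]


def is_directional(overhang: str) -> bool:
    """An overhang that differs from its own reverse complement is not
    self-complementary, so the two cut ends are distinguishable and a
    matching insert can ligate in only one orientation."""
    return overhang != reverse_complement(overhang)
```

Because the central nucleotide of the overhang is A or T, the
overhang can never equal its own reverse complement, which is
consistent with the single orientation stated above.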

A fusion sequence of a thioredoxin-like sequence and a
desired protein or peptide sequence according to this invention
may optionally contain a linker peptide inserted between the
thioredoxin-like sequence and the selected heterologous peptide
or protein. This linker sequence may encode, if desired, a
polypeptide which is selectably cleavable or digestible by
conventional chemical or enzymatic methods. For example, the
selected cleavage site may be an enzymatic cleavage site.
Examples of enzymatic cleavage sites include sites for cleavage
by a proteolytic enzyme, such as enterokinase, Factor Xa,
trypsin, collagenase, and thrombin. Alternatively, the cleavage
site in the linker may be a site capable of being cleaved upon
exposure to a selected chemical, e.g., cyanogen bromide,
hydroxylamine, or low pH.

Cleavage at the selected cleavage site enables separation of
the heterologous protein or peptide from the thioredoxin fusion
protein to yield the mature heterologous peptide or protein. The
mature peptide or protein may then be obtained in purified form,
free from any polypeptide fragment of the thioredoxin-like
protein to which it was previously linked. The cleavage site, if
inserted into a linker useful in the fusion sequences of this
invention, does not limit this invention. Any desired cleavage
site, of which many are known in the art, may be used for this
purpose.
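
The linker-and-cleavage arrangement described above can be
sketched with amino acid strings. The Python below is
illustrative only. The recognition sequences listed for
enterokinase, Factor Xa and thrombin are commonly cited consensus
sites taken from general knowledge, not from this document; the
carrier and peptide strings used with it are hypothetical toy
sequences, and the function names are hypothetical.

```python
# Commonly cited recognition sequences (assumed, not from this document).
# Each entry is (site, number of site residues left on the carrier side).
CLEAVAGE_SITES = {
    "enterokinase": ("DDDDK", 5),   # cleaves after the Lys
    "factor_xa": ("IEGR", 4),       # cleaves after the Arg
    "thrombin": ("LVPRGS", 4),      # cleaves between Arg and Gly
}


def build_fusion(carrier: str, peptide: str, protease: str) -> str:
    """Join carrier, cleavable linker and peptide in that order."""
    site, _ = CLEAVAGE_SITES[protease]
    return carrier + site + peptide


def cleave(fusion: str, protease: str) -> list[str]:
    """Split the fusion at the first occurrence of the protease site.

    Returns the uncut fusion in a one-element list if no site is found.
    """
    site, offset = CLEAVAGE_SITES[protease]
    idx = fusion.find(site)
    if idx < 0:
        return [fusion]
    cut = idx + offset
    return [fusion[:cut], fusion[cut:]]
```

In this sketch a site that cleaves at its carboxyl end, such as
the enterokinase or Factor Xa site, releases the peptide with no
linker residues attached, whereas the thrombin site leaves two
residues on the released peptide.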

The optional linker sequence of a fusion sequence of the
present invention may serve a purpose other than the provision of
a cleavage site. The linker may also be a simple amino acid
sequence of a sufficient length to prevent any steric hindrance
between the thioredoxin-like molecule and the selected
heterologous peptide or protein.

Whether or not such a linker sequence is necessary will
depend upon the structural characteristics of the selected
heterologous peptide or protein and whether or not the resulting
fusion protein is useful without cleavage. For example, where
the thioredoxin-like sequence is a human sequence, the fusion
protein may itself be useful as a therapeutic without cleavage of
the selected protein or peptide therefrom. Alternatively, where
the mature protein sequence may be naturally cleaved, no linker
may be needed.

In one embodiment therefore, the fusion sequence of this
invention contains a thioredoxin-like sequence fused directly at
its amino or carboxyl terminal end to the sequence of the
selected peptide or protein. The resulting fusion protein is
thus a soluble cytoplasmic fusion protein. In another
embodiment, the fusion sequence further comprises a linker
sequence interposed between the thioredoxin-like sequence and the
selected peptide or protein sequence. This fusion protein is
also produced as a soluble cytoplasmic protein. Similarly, where
the selected peptide sequence is inserted into the active site
loop region or elsewhere within the thioredoxin-like sequence, a
cytoplasmic fusion protein is produced.

The cytoplasmic fusion protein can be purified by
conventional means. Preferably, as a novel aspect of the present
invention, several thioredoxin fusion proteins of this invention
may be purified by exploiting an unusual property of thioredoxin.
The cytoplasm of E. coli is effectively isolated from the
external medium by a cell envelope comprising two membranes,
inner and outer, separated from each other by a periplasmic space
within which lies a rigid peptidoglycan cell wall. The
peptidoglycan wall contributes both shape and strength to the
cell. At certain locations in the cell envelope there are "gaps"
(called variously Bayer patches, Bayer junctions or adhesion
sites) in the peptidoglycan wall where the inner and outer
membranes appear to meet and perhaps fuse together. See, M. E.
Bayer, J. Bacteriol. 93:1104-1112 (1967) and J. Gen. Microbiol.
53:395-404 (1968). Most of the cellular thioredoxin lies loosely
associated with the inner surface of the membrane at these
adhesion sites and can be quantitatively expelled from the cell
through these adhesion sites by a sudden osmotic shock or by a
simple freeze/thaw procedure. See C. A. Lunn and V. P. Pigiet,
J. Biol. Chem. 257:11424-11430 (1982) and in "Thioredoxin and
Glutaredoxin Systems: Structure and Function", 165-176 (1986) ed.
A. Holmgren et al., Raven Press, New York. To a lesser extent
some EF-Tu (elongation factor-Tu) can be expelled in the same way
[Jacobson et al, Biochemistry 15:2297-2302 (1976)], but, with the
exception of the periplasmic contents, the vast majority of E.
coli proteins cannot be released by these treatments.

Although there have been reports of the release by osmotic
shock of a limited number of heterologous proteins produced in
the cytoplasm of E. coli [Denefle et al, Gene 85:499-510 (1989);
Joseph-Liauzun et al, Gene 86:291-295 (1990); Rosenwasser et al,
J. Biol. Chem. 265:13066-13073 (1990)], the ability to be so
released is a rare and desirable property not shared by the
majority of heterologous proteins. Fusion of a heterologous
protein to thioredoxin as described by the present invention not
only enhances its expression, solubility and stability as
described above, but may also provide for its release from the
cell by osmotic shock or freeze/thaw treatments, greatly
simplifying its purification. The thioredoxin portion of the
fusion protein in some cases, e.g., with MIP, directs the fusion
protein towards the adhesion sites, from where it can be released
to the exterior by these treatments.

In another embodiment the present invention may employ
another component, that is, a secretory leader sequence, among
which many are known in the art, e.g. leader sequences of phoA,
MBP, β-lactamase, operatively linked in frame to the fusion
protein of this invention to enable the expression and secretion
of the mature fusion protein into the bacterial periplasmic space
or culture medium. This leader sequence may be fused to the
amino terminus of the thioredoxin-like molecule when the selected
peptide or protein sequence is fused to the carboxyl terminus or
to an internal site within the thioredoxin-like sequence. An
optional linker could also be present when the peptide or protein
is fused at the carboxyl terminus. It is expected that this
fusion sequence construct when expressed in an appropriate host
cell would be expressed as a secreted fusion protein rather than
a cytoplasmic fusion protein. However stability, solubility and
high expression should characterize fusion proteins produced
using any of these alternative embodiments.

This invention is not limited to any specific type of
heterologous peptide or protein. A wide variety of heterologous
genes or gene fragments are useful in forming the fusion
sequences of the present invention. While the compositions and
methods of this invention are most useful for peptides or
proteins which are not expressed, expressed in inclusion bodies,
or expressed in very small amounts in bacterial and yeast hosts,
the heterologous peptides or proteins can include any peptide or
protein useful for human or veterinary therapy, diagnostic or
research applications in any expression system. For example,
hormones, cytokines, growth or inhibitory factors, enzymes,
modified or wholly synthetic proteins or peptides can be produced
according to this invention in bacterial, yeast, mammalian or
other eukaryotic cells and expression systems suitable therefor.

In the examples below illustrating this invention, the
proteins expressed by this invention include IL-11, MIP-1α, IL-6,
M-CSF, a bone inductive factor called BMP-2, and a variety of
small peptides of random sequence. These proteins include
examples of proteins which, when expressed without a thioredoxin
fusion partner, are unstable in E. coli or are found in inclusion
bodies.

A variety of DNA molecules incorporating the above-described
fusion sequences may be constructed for expressing the
heterologous peptide or protein according to this invention. At
a minimum a desirable DNA sequence according to this invention
comprises a fusion sequence described above, in association with,
and under the control of, an expression control sequence capable
of directing the expression of the fusion protein in a desired
host cell. For example, where the host cell is an E. coli
strain, the DNA molecule desirably contains a promoter which
functions in E. coli, a ribosome binding site, and optionally, a
selectable marker gene and an origin of replication if the DNA
molecule is extrachromosomal. Numerous bacterial expression
vectors containing these components are known in the art for
bacterial expression, and can easily be constructed by standard
molecular biology techniques. Similarly known yeast and
mammalian cell vectors and vector components may be utilized
where the host cell is a yeast cell or a mammalian cell.

The DNA molecules containing the fusion sequences may be
further modified to contain different codons to optimize
expression in the selected host cell, as is known in the art.

These DNA molecules may additionally contain multiple copies
of the thioredoxin-like DNA sequence, with the heterologous
protein fused to only one of the DNA sequences, or with the
heterologous protein fused to all copies of the thioredoxin-like
sequence. It may also be possible to integrate a thioredoxin-like/
heterologous peptide or protein-encoding fusion sequence
into the chromosome of a selected host to either replace or
duplicate a native thioredoxin-like sequence.

Host cells suitable for the present invention are preferably
bacterial cells. For example, the various strains of E. coli
(e.g., HB101, W3110 and strains used in the following examples)
are well-known as host cells in the field of biotechnology. E.
coli strain GI724, used in the following examples, has been
deposited with a United States microorganism depository as
described in detail below. Various strains of B. subtilis,
Pseudomonas, and other bacteria may also be employed in this
method.

Many strains of yeast and other eukaryotic cells known to
those skilled in the art may also be useful as host cells for
expression of the polypeptides of the present invention.
Similarly known mammalian cells may also be employed in the
expression of these fusion proteins.

To produce the fusion protein of this invention, the host
cell is either transformed with, or has integrated into its
genome, a DNA molecule comprising a thioredoxin-like DNA sequence
fused to the DNA sequence of a selected heterologous peptide or
protein, desirably under the control of an expression control
sequence capable of directing the expression of a fusion protein.
The host cell is then cultured under known conditions suitable
for fusion protein production. If the fusion protein accumulates
in the cytoplasm of the cell it may be released by conventional
bacterial cell lysis techniques and purified by conventional
procedures including selective precipitations, solubilizations
and column chromatographic methods. If a secretory leader is
incorporated into the fusion molecule substantial purification is
achieved when the fusion protein is secreted into the periplasmic
space or the growth medium.

Alternatively, for cytoplasmic thioredoxin fusion proteins,
a selective release from the cell may be achieved by osmotic
shock or freeze/thaw procedures. Although final purification is
still required for most purposes, the initial purity of fusion
proteins in preparations resulting from these procedures is
superior to that obtained in conventional whole cell lysates,
reducing the number of subsequent purification steps required to
attain homogeneity. In a typical osmotic shock procedure, the
packed cells containing the fusion protein are resuspended on ice
in a buffer containing EDTA and having a high osmolarity, usually
due to the inclusion of a solute, such as 20% w/v sucrose, in the
buffer which cannot readily cross the cytoplasmic membrane.
During a brief incubation on ice the cells plasmolyze as water
leaves the cytoplasm down the osmotic gradient. The cells are
then switched into a buffer of low osmolarity, and during the
osmotic re-equilibration both the contents of the periplasm and
proteins localized at the Bayer patches are released to the
exterior. A simple centrifugation following this release removes
the majority of bacterial cell-derived contaminants from the
fusion protein preparation. Alternatively, in a freeze/thaw
procedure the packed cells containing the fusion protein are
first resuspended in a buffer containing EDTA and are then
frozen. Fusion protein release is subsequently achieved by
allowing the frozen cell suspension to thaw. The majority of
contaminants can be removed as described above by a
centrifugation step. The fusion protein is further purified by
well-known conventional methods.

These treatments typically release at least 30% of the
fusion proteins without lysing the cell cultures. The success of
these procedures in releasing significant amounts of a wide
variety of thioredoxin fusion proteins is surprising, since such
techniques are not generally successful with a wide range of
proteins. The ability of these fusion proteins to be
substantially purified by such treatments, which are
significantly simpler and less expensive than the purification
methods required by other fusion protein systems, may provide the
fusion proteins of the invention with a significant advantage
over other systems which are used to produce proteins in E. coli.

The resulting fusion protein is stable and soluble, often
with the heterologous peptide or protein retaining its
bioactivity. The heterologous peptide or protein may optionally
be separated from the thioredoxin-like protein by cleavage, as
discussed above.

In the specific and illustrative embodiments of the
compositions and methods of this invention, the E. coli
thioredoxin (trxA) gene has been cloned and placed in an E. coli
expression system. An expression plasmid pALtrxA-781 was
constructed. This plasmid containing modified IL-11 fused to the
thioredoxin sequence and called pALtrxA/EK/IL11ΔPro-581 is
described below in Example 1 and in Fig. 1. A modified version
of this plasmid containing a different ribosome binding site was
employed in the other examples and is specifically described in
Example 3. Other conventional vectors may be employed in this
invention. The invention is not limited to the plasmids
described in these examples.

Plasmid pALtrxA-781 (without the modified IL-11) directs the
accumulation of >10% of the total cell protein as thioredoxin in
E. coli host strain GI724. Examples 2 through 6 describe the use
of this plasmid to form and express thioredoxin fusion proteins
with BMP-2, IL-6 and MIP-1α, which are polypeptides.

As an example of the expression of small peptides inserted
into the active-site loop, a derivative of pALtrxA-781 has been
constructed in which a 13 amino-acid linker peptide sequence
containing a cleavage site for the specific protease enterokinase
[Liepnieks and Light, J. Biol. Chem. 254:1677-1683 (1979)] has
been fused into the active site loop of thioredoxin. This
plasmid (pALtrxA-EK) directs the accumulation of >10% of the
total cell protein as the fusion protein. The fusion protein is
all soluble, indicating that it has probably adopted a "native"
tertiary structure. It is as stable as wild type
thioredoxin to prolonged incubations at 80°C, suggesting that the
strong tertiary structure of thioredoxin has not been compromised
by the insertion into the active site loop. The fusion protein
is specifically cleaved by enterokinase, whereas thioredoxin is
not, indicating that the peptide inserted into the active site
loop is present on the surface of the fusion protein.

As described in more detail in Example 5 below, fusions of
small peptides were made into the active site loop of
thioredoxin. The inserted peptides were 14 residues long and
were of totally random composition to test the ability of the
system to deal with hydrophobic, hydrophilic and neutral
sequences.

The methods and compositions of this invention permit the
production of proteins and peptides useful in research,
diagnostic and therapeutic fields. The production of fusion
proteins according to this invention has a number of advantages.
As one example, the production of a selected protein by the
present invention as a carboxyl-terminal fusion to E. coli
thioredoxin, or another thioredoxin-like protein, enables
avoidance of translation initiation problems often encountered in
the production of eukaryotic proteins in E. coli. Additionally
the initiator methionine usually remaining on the amino-terminus
of the heterologous protein is not present and does not have to
be removed when the heterologous protein is made as a carboxyl
terminal thioredoxin fusion.

The production of fusion proteins according to this
invention reliably improves solubility of desired heterologous
proteins and enhances their stability to proteases in the
expression system. This invention also enables high level
expression of certain desirable therapeutic proteins, e.g.,
IL-11, which are otherwise produced at low levels in bacterial
host cells.

This invention may also confer heat stability to the fusion
protein, especially if the heterologous protein itself is heat
stable. Because thioredoxin, and presumably all thioredoxin-like
proteins, are heat stable up to 80°C, the present invention may
enable the use of a simple heat treatment as an initial effective
purification step for some thioredoxin fusion proteins.

In addition to providing high levels of the selected
heterologous proteins or peptides upon cleavage from the fusion
protein for therapeutic or other uses, the fusion proteins or
fusion peptides of the present invention may themselves be useful
as therapeutics. Further the thioredoxin-like fusion proteins
may provide a vehicle for the delivery of bioactive peptides. As
one example, human thioredoxin would not be antigenic in humans,
and therefore a fusion protein of the present invention with
human thioredoxin may be useful as a vehicle for delivering to
humans the biologically active peptide to which it is fused.
Because human thioredoxin is an intracellular protein, human
thioredoxin fusion proteins may be produced in an E. coli
intracellular expression system. Thus this invention also
provides a method for delivering biologically active peptides or
proteins to a patient in the form of a fusion protein with an
acceptable thioredoxin-like protein.

The present invention also provides methods and reagents for
screening libraries of random peptides for their potential enzyme
inhibitory, hormone/growth factor agonist and hormone/growth
factor antagonist activity. Also provided are methods and
reagents for the mapping of known protein sequences for regions
of potential interest, including receptor binding sites,
substrate binding sites, phosphorylation/modification sites,
protease cleavage sites, and epitopes.

Bacterial colonies expressing thioredoxin-like/random
peptide fusion proteins may be screened using radiolabelled
proteins such as hormones or growth factors as probes. Positives
arising from this type of screen would identify mimics of
receptor binding sites and may lead to the design of compounds
with therapeutic uses. Bacterial colonies expressing
thioredoxin-like random peptide fusion proteins may also be
screened using antibodies raised against native, active hormones
or growth factors. Positives arising from this type of screen
could be mimics of surface epitopes present on the original
antigen. Where such surface epitopes are responsible for
receptor binding, the "positive" fusion proteins would have
biological activity.

Additionally, the thioredoxin-like fusion proteins or fusion
peptides of this invention may also be employed to develop
monoclonal and polyclonal antibodies, or recombinant antibodies
or chimeric antibodies, generated by known methods for
diagnostic, purification or therapeutic use. Studies of
thioredoxin-like molecules indicate a possible B cell/T cell
growth factor activity [N. Wakasugi et al, cited above], which
may enhance immune response. The fusion proteins or peptides of
the present invention may be employed as antigens to elicit
desirable antibodies, which themselves may be further manipulated
by known techniques into monoclonal or recombinant antibodies.
Alternatively, antibodies elicited to thioredoxin-like
sequences may also be useful in the purification of many
different thioredoxin fusion proteins.

The following examples illustrate embodiments of the present
invention, but are not intended to limit the scope of the
disclosure.

EXAMPLE 1 - THIOREDOXIN-IL-11 FUSION MOLECULE

A thioredoxin-like fusion molecule of the present invention
was prepared using E. coli thioredoxin as the thioredoxin-like
sequence and recombinant IL-11 as the selected heterologous
protein. The DNA and amino acid sequence of IL-11 has been
published. See Paul et al, Proc. Natl. Acad. Sci. U.S.A.
87:7512-7516 (1990) and PCT Patent publication WO91/0749,
published May 30, 1991. IL-11 DNA can be obtained by cloning



wo 92/13955 



PCT/US92/00944 




based on its published sequence. The E. coli thioredoxin (trxA)
gene was cloned based on its published sequence and employed to
construct various related E. coli expression plasmids using
standard DNA manipulation techniques, described extensively by
Sambrook, Fritsch and Maniatis, Molecular Cloning, A Laboratory
Manual, 2nd edition, Cold Spring Harbor Laboratory, Cold Spring
Harbor, NY (1989).

A first expression plasmid pALtrxA-781 was constructed
containing the E. coli trxA gene without fusion to another
sequence. This plasmid further contained sequences which are
described in detail below for the related IL-11 fusion plasmid.
This first plasmid, which directs the accumulation of >10% of the
total cell protein as thioredoxin in an E. coli host strain
GI724, was further manipulated as described below for the
construction of a trxA/IL-11 fusion sequence.

The entire sequence of the related plasmid expression
vector, pALtrxA/EK/IL11ΔPro-581, is illustrated in Fig. 1 and
contains the following principal features:

Nucleotides 1-2060 contain DNA sequences originating from
the plasmid pUC-18 [Norrander et al, Gene 26:101-106 (1983)]
including sequences containing the gene for β-lactamase which
confers resistance to the antibiotic ampicillin in host E. coli
strains, and a colE1-derived origin of replication. Nucleotides
2061-2221 contain DNA sequences for the major leftward promoter
(pL) of bacteriophage λ [Sanger et al, J. Mol. Biol. 162:729-773
(1982)], including three operator sequences, OL1, OL2 and OL3.
The operators are the binding sites for λcI repressor protein,
intracellular levels of which control the amount of transcription
initiation from pL. Nucleotides 2222-2241 contain a strong
ribosome binding sequence derived from that of gene 10 of
bacteriophage T7 [Dunn and Studier, J. Mol. Biol. 166:477-535
(1983)].

Nucleotides 2242-2568 contain a DNA sequence encoding the
E. coli thioredoxin protein [Lim et al, J. Bacteriol. 163:311-316
(1985)]. There is no translation termination codon at the end of
the thioredoxin coding sequence in this plasmid.

Nucleotides 2569-2583 contain DNA sequence encoding the
amino acid sequence for a short, hydrophilic, flexible spacer
peptide "--GSGSG--". Nucleotides 2584-2598 provide DNA sequence
encoding the amino acid sequence for the cleavage recognition
site of enterokinase (EC 3.4.4.8), "--DDDDK--" [Maroux et al, J.
Biol. Chem. 246:5031-5039 (1971)].

Nucleotides 2599-3132 contain DNA sequence encoding the
amino acid sequence of a modified form of mature human IL-11
[Paul et al, Proc. Natl. Acad. Sci. USA 87:7512-7516 (1990)],
deleted for the N-terminal prolyl-residue normally found in the
natural protein. The sequence includes a translation termination
codon at the 3'-end of the IL-11 sequence.

Nucleotides 3133-3159 provide a "Linker" DNA sequence
containing restriction endonuclease sites. Nucleotides 3160-3232
provide a transcription termination sequence based on that of the
E. coli aspA gene [Takagi et al, Nucl. Acids Res. 13:2063-2074
(1985)]. Nucleotides 3233-3632 are DNA sequences derived from
pUC-18.
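
The feature coordinates listed above can be cross-checked with a few lines of Python. This is only a bookkeeping sketch and not part of the disclosed method: the short feature labels and the helper names (`span_length`, `is_contiguous`, `codon_counts`) are illustrative assumptions, while the nucleotide coordinates are those stated in the text. The sketch checks that the features tile the plasmid without gaps and that each coding segment is a whole number of codons.

```python
# Feature table for the IL-11 fusion plasmid; coordinates as stated in the text.
# The labels are illustrative shorthand, not names used in the specification.
FEATURES = [
    ("pUC-18 (bla, colE1 ori)", 1, 2060),
    ("lambda pL promoter", 2061, 2221),
    ("T7 gene 10 RBS", 2222, 2241),
    ("thioredoxin (trxA)", 2242, 2568),
    ("GSGSG spacer", 2569, 2583),
    ("enterokinase site DDDDK", 2584, 2598),
    ("IL-11 (minus N-terminal Pro)", 2599, 3132),
    ("linker", 3133, 3159),
    ("aspA terminator", 3160, 3232),
    ("pUC-18", 3233, 3632),
]

# Segments that are translated as part of the fusion reading frame.
CODING = (
    "thioredoxin (trxA)",
    "GSGSG spacer",
    "enterokinase site DDDDK",
    "IL-11 (minus N-terminal Pro)",
)


def span_length(start, end):
    """Length in nucleotides of an inclusive 1-based interval."""
    return end - start + 1


def is_contiguous(features):
    """True if every feature starts one nucleotide after the previous one ends."""
    return all(nxt[1] == cur[2] + 1 for cur, nxt in zip(features, features[1:]))


def codon_counts(features, coding=CODING):
    """Number of codons in each coding feature; raises if a length is not a multiple of 3."""
    counts = {}
    for name, start, end in features:
        if name in coding:
            nt = span_length(start, end)
            if nt % 3 != 0:
                raise ValueError(f"{name}: {nt} nt is not a whole number of codons")
            counts[name] = nt // 3
    return counts


if __name__ == "__main__":
    print("contiguous:", is_contiguous(FEATURES))
    for name, n in codon_counts(FEATURES).items():
        print(f"{name}: {n} codons")
```

With the stated coordinates, the thioredoxin segment is 327 nucleotides (109 codons), the spacer and enterokinase site are 5 codons each, and the IL-11 segment is 534 nucleotides (178 codons, which by the text includes the termination codon).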

As described in Example 2 below, when cultured under the
appropriate conditions in a suitable E. coli host strain, this
plasmid vector can direct the production of high levels
(approximately 10% of the total cellular protein) of a
thioredoxin-IL-11 fusion protein. By contrast, when not fused to
thioredoxin, IL-11 accumulated to only 0.2% of the total cellular
protein when expressed in an analogous host/vector system.

EXAMPLE 2 - EXPRESSION OF A FUSION PROTEIN

A thioredoxin-IL-11 fusion protein was produced according to
the following protocol using the plasmid constructed as described
in Example 1. pALtrxA/EK/IL11ΔPro-581 was transformed into the
E. coli host strain GI724 (F-, lacIq, lacPL8, ampC::λcI+) by the
procedure of Dagert and Ehrlich, Gene 6:23 (1979). The
untransformed host strain E. coli GI724 was deposited with the
American Type Culture Collection, 12301 Parklawn Drive,
Rockville, Maryland on January 31, 1991 under ATCC No. 55151 for
patent purposes pursuant to applicable laws and regulations.
Transformants were selected on 1.5% w/v agar plates containing
IMC medium, which is composed of M9 medium [Miller, "Experiments
in Molecular Genetics", Cold Spring Harbor Laboratory, New York
(1972)] supplemented with 0.5% w/v glucose, 0.2% w/v casamino
acids and 100 µg/ml ampicillin.

GI724 contains a copy of the wild-type λcI repressor gene
stably integrated into the chromosome at the ampC locus, where it
has been placed under the transcriptional control of Salmonella
typhimurium trp promoter/operator sequences. In GI724, λcI
protein is made only during growth in tryptophan-free media, such
as minimal media or a minimal medium supplemented with casamino
acids such as IMC, described above. Addition of tryptophan to a
culture of GI724 will repress the trp promoter and turn off
synthesis of λcI, gradually causing the induction of
transcription from pL promoters if they are present in the cell.
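
The regulatory logic just described is a two-step inversion: tryptophan represses the trp promoter, the trp promoter drives the repressor, and the repressor shuts off pL. A minimal Python sketch of that steady-state logic is given here for illustration only; the function names are hypothetical, and the model deliberately ignores kinetics (the text notes that induction is gradual as existing repressor is lost).

```python
def ci_repressor_made(tryptophan_present: bool) -> bool:
    """The cI gene sits behind the trp promoter, which tryptophan represses."""
    trp_promoter_on = not tryptophan_present
    return trp_promoter_on


def pl_transcription_on(tryptophan_present: bool) -> bool:
    """pL is transcribed only when no cI repressor is made to occupy its operators."""
    return not ci_repressor_made(tryptophan_present)


if __name__ == "__main__":
    for trp in (False, True):
        state = "induced" if pl_transcription_on(trp) else "repressed"
        print(f"tryptophan present={trp}: pL {state}")
```

In this simplified picture, growth in tryptophan-free medium keeps pL repressed, and adding tryptophan induces it.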

GI724 transformed with pALtrxA/EK/IL11ΔPro-581 was grown at
37°C to an A550 of 0.5 in IMC medium. Tryptophan was added to a
final concentration of 100 µg/ml and the culture incubated for a
further 4 hours. During this time thioredoxin-IL-11 fusion
protein accumulated to approximately 10% of the total cell
protein.

All of the fusion protein was found to be in the soluble
cellular fraction, and was purified as follows. Cells were lysed
in a French pressure cell at 20,000 psi in 50 mM HEPES pH 8.0, 1
mM phenylmethylsulfonyl fluoride. The lysate was clarified by
centrifugation at 15,000 x g for 30 minutes and the supernatant
loaded onto a QAE-Toyopearl column. The flow-through fractions
were discarded and the fusion protein eluted with 50 mM HEPES pH
8.0, 100 mM NaCl. The eluate was adjusted to 2 M NaCl and loaded
onto a column of phenyl-Toyopearl. The flow-through fractions
were again discarded and the fusion protein eluted with 50 mM
HEPES pH 8.0, 0.5 M NaCl.

The fusion protein was then dialyzed against 25 mM HEPES pH
8.0 and was >80% pure at this stage. By T1165 bioassay [Paul et
al, cited above] the purified thioredoxin-IL-11 protein exhibited
an activity of 8x10^ U/mg. This value agrees closely on a molar
basis with the activity of 2x10^ U/mg found for COS cell-derived
IL-11 purified to homogeneity and measured for activity in the
same assay. One milligram of the fusion protein was then cleaved
at 37°C for 20 hours with 1000 units of bovine enterokinase
[Liepnieks and Light, J. Biol. Chem. 254:1677-1683 (1979)] in 1
ml 10 mM Tris-Cl (pH 8.0)/10 mM CaCl2. IL-11 was recovered from
the reaction products by passing them over a QAE-Toyopearl column
in 25 mM HEPES pH 8.0, where homogeneous IL-11 was found in the
flow-through fractions. Uncleaved fusion protein, thioredoxin
and enterokinase remained bound on the column.

The homogeneous IL-11 prepared in this manner had a
bioactivity in the T1165 assay of 2.5x10^ U/mg. Its physical and
chemical properties were determined as follows:

(1) Molecular Weight

The molecular weight of the IL-11 was found to be about 21
kD as measured by 10% SDS-PAGE under reducing conditions (tricine
system) in accordance with the methods of Schagger, et al., Anal.
Biochem. 166:368-379 (1987). The compound ran as a single band.

(2) Endotoxin Content 

The endotoxin content of the IL-11 was found to be less than
0.1 nanogram per milligram IL-11 in the LAL (Limulus amebocyte
lysate, Pyrotell, available from Associates of Cape Cod, Inc.,
Woods Hole, Massachusetts, U.S.A.) assay, conducted in accordance
with the manufacturer's instructions.

(3) Isoelectric Point 

The theoretical isoelectric point of IL-11 is pH 11.70. As
measured by polyacrylamide gel isoelectric focusing using an LKB
Ampholine PAGplate with a pH range from 3.5 to 9.5, the IL-11 ran
at greater than 9.5. An exact measurement could not be taken
because IL-11 is too basic a protein for the reliable gels
available.

(4) Fluorescence Absorption Spectrum

Fluorescence absorption spectrum of the IL-11, as measured
on a 0.1% aqueous solution in a 1 cm quartz cell, showed an
emission maximum at 335-337 nm.

(5) UV Absorption

UV absorption of the IL-11 on a 0.1% aqueous solution in a 1
cm quartz cell showed an absorbance maximum at 278-280 nm.



(6) Amino Acid Composition

The theoretical amino acid composition for IL-11, based on
its amino acid sequence is as follows:

Amino Acid    Number    Mole %
Ala           20        11.3
Asp           11         6.22
Cys
Glu            3         1.70
Phe            1         0.57
Gly           14         7.91
His
Ile            2         1.13
Lys            3         1.70
Leu           41        23.16
Met            2         1.13
Asn            1         0.57
Pro           21        11.86
Gln            7         3.96
Arg           18        10.17
Ser           11         6.22
Thr            9         5.09
Val            5         2.83
Trp            3         1.70
Tyr            1         0.57


A sample of homogeneous IL-11 was subjected to vapor phase
hydrolysis as follows:

6 N HCl and 2 N phenol reagent were added to a hydrolysis
vessel in which tubes containing 45 µl of 1:10 diluted (w/H2O)
IL-11, concentrated to dryness, are inserted. Samples were sealed
under vacuum and hydrolyzed for 36 hours at 110°C. After the
hydrolysis, samples were dried and resuspended in 500 µl Na-S
sample dilution buffer. Amino acid analysis was performed on a
Beckman 7300 automated amino acid analyzer. A cation exchange
column was used for separation of amino acids, followed by post
column derivatization with ninhydrin. Primary amino acids were
detected at 570 nm and secondary amino acids were detected at 440
nm. Eight point calibration curves were constructed for each of
the amino acids.

Because certain amino acids are typically not recovered,
results for only 5 amino acids are given below. Since the
hydrolysis was done without desalting the protein, 100% recovery
was achieved for most of the amino acids.

The relative recovery of each individual amino acid residue
per molecule of recombinant IL-11 was determined by normalizing
GLX = 10 (the predicted number of glutamine and glutamic acid
residues in IL-11 based on cDNA sequence). The value obtained for
the recovery of GLX in picomoles was divided by 10 to obtain the
GLX quotient. Dividing the value obtained for the recovery in
picomoles of each amino acid by the GLX quotient for that sample
gives a number that represents the relative recovery of each
amino acid in the sample, normalized to the quantitative recovery
of GLX residues. The correlation coefficient comparing the
expected versus the average number of residues of each amino acid
observed is greater than 0.985, indicating that the number of
residues observed for each amino acid is in good agreement with
that predicted from the sequence.

     Amino    No. of Residues   No. of Residues   Correlation
     Acid     Calculated        Expected          Coefficient

1    Asp      12.78             12
2    Glu      10.00             10
3    Gly      12.80             14                0.9852
4    Arg      16.10             18
5    Pro      18.40             21
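
The GLX normalization and the correlation check described above can be sketched in Python. This is an illustrative sketch under stated assumptions, not the analysis actually run: the function names are hypothetical, the picomole values in the usage example are invented solely to show the arithmetic, and an ordinary Pearson coefficient is assumed as the "correlation coefficient". The expected and calculated residue numbers are the five rows tabulated in this section.

```python
from math import sqrt

# Expected and calculated residue numbers, as tabulated in this section.
EXPECTED = {"Asp": 12, "Glu": 10, "Gly": 14, "Arg": 18, "Pro": 21}
CALCULATED = {"Asp": 12.78, "Glu": 10.00, "Gly": 12.80, "Arg": 16.10, "Pro": 18.40}


def normalize_to_glx(picomoles, glx_key="Glu", glx_residues=10):
    """Divide each recovery (pmol) by the GLX quotient (GLX pmol / 10)."""
    glx_quotient = picomoles[glx_key] / glx_residues
    return {aa: pmol / glx_quotient for aa, pmol in picomoles.items()}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical picomole recoveries, chosen only to illustrate the normalization.
    example_pmol = {"Asp": 639.0, "Glu": 500.0, "Gly": 640.0, "Arg": 805.0, "Pro": 920.0}
    print(normalize_to_glx(example_pmol))

    order = list(EXPECTED)
    r = pearson([EXPECTED[a] for a in order], [CALCULATED[a] for a in order])
    print(f"correlation coefficient: {r:.4f}")
```

Applied to the five tabulated rows, this Pearson calculation gives a value of about 0.985, close to the reported coefficient; any small difference would be expected if the reported figure was computed from unrounded or averaged data.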






(7) Amino Terminus Sequencing 

IL-11 (buffered in 95% acetonitrile TFA) was sequenced using
an ABI 471A protein sequencer (ABI, Inc.) in accordance with the
manufacturer's instructions. Amino terminus sequencing confirmed
that the thioredoxin fusion protein produced IL-11 contained the
correct IL-11 amino acid sequence and only one amino terminus
was observed.

(8) Peptide Mapping

The IL-11 was cleaved with Endoproteinase Asp-N (Boehringer
Mannheim) (1:500 ratio of Asp-N to IL-11) in 10 mM Tris, pH 8, 1
M urea and 2 mM 4-aminobenzamidine dihydrochloride (PABA), at
37°C for 4 hours. The sample was then run on HPLC on a C4 Vydac
column using an A buffer of 50 mM NaHPO4, pH 4.3, in dH2O, a B
buffer of 100% isopropanol with a gradient at 1 ml/min from 100% A
to 25% A and 75% B (changing 1%/minute). The eluted peptide
fragments were then sequenced using an ABI 471A protein sequencer
(ABI, Inc.) in accordance with the manufacturer's instructions.
Peptide mapping confirmed the IL-11 produced from the thioredoxin
fusion protein contained the proper IL-11 N-terminal and
C-terminal sequences.
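
The gradient stated above (100% A to 25% A/75% B, changing 1% per minute at 1 ml/min) implies a run time and solvent volume that are easy to work out. The short Python sketch below does the arithmetic; it assumes a linear ramp that starts at time zero and holds at 75% B once reached, which the text does not spell out, and the function names are illustrative only.

```python
FLOW_ML_PER_MIN = 1.0          # flow rate stated in the text
RAMP_PERCENT_B_PER_MIN = 1.0   # "changing 1%/minute"
FINAL_PERCENT_B = 75.0         # gradient ends at 25% A / 75% B


def percent_b(t_min):
    """Percent B buffer at time t (minutes), assuming a linear ramp from 0% B at t = 0."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    return min(t_min * RAMP_PERCENT_B_PER_MIN, FINAL_PERCENT_B)


def gradient_duration_min():
    """Minutes needed to go from 0% B to the final percent B."""
    return FINAL_PERCENT_B / RAMP_PERCENT_B_PER_MIN


def gradient_volume_ml():
    """Total volume pumped during the ramp at the stated flow rate."""
    return gradient_duration_min() * FLOW_ML_PER_MIN


if __name__ == "__main__":
    print("duration (min):", gradient_duration_min())
    print("volume (ml):", gradient_volume_ml())
    print("%B at 30 min:", percent_b(30))
```

Under these assumptions the ramp takes 75 minutes and consumes 75 ml of mobile phase.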

(9) Solubility

IL-11 protein was tested for solubility in the substances
below with the following results:

Water                  very soluble
Ethyl Alcohol          very soluble
Acetone                very soluble
1 M sodium chloride    very soluble
10% sucrose            very soluble

(10) Sugar Composition and Protein/Polysaccharide Content in %

The absence of sugar moieties attached to the polypeptide
backbone of the IL-11 protein is indicated by its amino acid
sequence, which contains none of the typical sugar attachment
sites.

EXAMPLE 3 - THIOREDOXIN-MIP FUSION MOLECULE

Human macrophage inflammatory protein 1α (MIP-1α) was
expressed at high levels in E. coli as a thioredoxin fusion
protein using an expression vector similar to
pALtrxA/EK/IL11ΔPro-581 described in Example 1 above but modified
in the following manner to replace the ribosome binding site of
bacteriophage T7 with that of λCII. In the plasmid of Example 1,
nucleotides 2222 to 2241 were removed by conventional means.
Inserted in place of those nucleotides was a sequence of
nucleotides formed by nucleotides 35566 to 35472 and 38137 to
38361 from bacteriophage lambda as described in Sanger et al
(1982) cited above. This reference is incorporated by reference
for the purpose of disclosing this sequence. To express a
thioredoxin-MIP-1α fusion the DNA sequence in the thusly-modified
pALtrxA/EK/IL11ΔPro-581 encoding human IL-11 (nucleotides
2599-3132) is replaced by the 213 nucleotide DNA sequence shown in
Fig. 2 encoding full-length, mature human MIP-1α [Nakao et al,
Mol. Cell. Biol. 10:3646-3658 (1990)].

The host strain and expression protocol used for the
production of thioredoxin-MIP-1α fusion protein are as described
in Example 1. As was seen with the thioredoxin-IL-11 fusion
protein, all of the thioredoxin-MIP-1α fusion protein was found
in the soluble cellular fraction, representing up to 20% of the
total protein.

Cells were lysed as in Example 1 to give a protein
concentration in the crude lysate of 10 mg/ml. This lysate was
then heated at 80°C for 10 min to precipitate the majority of
contaminating E. coli proteins and was clarified by
centrifugation at 130,000 x g for 60 minutes. The pellet was
discarded and the supernatant loaded onto a Mono Q column. The
fusion protein eluted at approximately 0.5 M NaCl from this
column and was >80% pure at this stage. After dialysis to remove
salt the fusion protein could be cleaved by an enterokinase
treatment as described in Example 1 to release MIP-1α.

EXAMPLE 4 - THIOREDOXIN-BMP-2 FUSION MOLECULE

Human Bone Morphogenetic Protein 2 (BMP-2) was expressed at
high levels in E. coli as a thioredoxin fusion protein using the
modified expression vector described in Example 3. The DNA
sequence encoding human IL-11 in the modified
pALtrxA/EK/IL11ΔPro-581 (nucleotides 2599-3132) is replaced by
the 345 nucleotide DNA sequence shown in Fig. 3 encoding
full-length, mature human BMP-2 [Wozney et al, Science
242:1528-1534 (1988)].

In this case the thioredoxin-BMP-2 fusion protein appeared
in the insoluble cellular fraction when strain GI724 containing
the expression vector was grown in medium containing tryptophan
at 37°C. However, when the temperature of the growth medium was
lowered to 20°C the fusion protein was found in the soluble
cellular fraction.

EXAMPLE 5 - THIOREDOXIN-SMALL PEPTIDE FUSION MOLECULES

Native E. coli thioredoxin was expressed at high levels in
E. coli using strain GI724 containing the same plasmid expression
vector described in Example 3 deleted for nucleotides 2569-3129,
and employing the growth and induction protocol outlined in
Example 1. Under these conditions thioredoxin accumulated to
approximately 10% of the total protein, all of it in the soluble
cellular fraction.

Fig. 4 illustrates insertion of 13 amino acid residues
containing an enterokinase cleavage site into the active site loop
of thioredoxin, between residues G34 and P35 of the thioredoxin
protein sequence. The fusion protein containing this internal
enterokinase site was expressed at levels equivalent to native
thioredoxin, and was cleaved with an enterokinase treatment as
outlined in Example 1 above. The fusion protein was found to be
as stable as native thioredoxin to heat treatments, being
resistant to a 10 minute incubation at 80°C as described in
Example 4.

Below are listed twelve additional peptide insertions which
were also made into the active site loop of thioredoxin between
G34 and P35. The sequences are each 14 amino acid residues in
length and are random in composition. Each of the thioredoxin
fusion proteins containing these random insertions was made at
levels comparable to native thioredoxin. All of them were found
in the soluble cellular fraction. These peptides include the
following sequences:

Pro-Leu-Gln-Arg-Ile-Pro-Pro-Gln-Ala-Leu-Arg-Val-Glu-Gly,
Pro-Arg-Asp-Cys-Val-Gln-Arg-Gly-Lys-Ser-Leu-Ser-Leu-Gly,
Pro-Met-Arg-His-Asp-Val-Arg-Cys-Val-Leu-His-Gly-Thr-Gly,
Pro-Gly-Val-Arg-Leu-Pro-Ile-Cys-Tyr-Asp-Asp-Ile-Arg-Gly,
Pro-Lys-Phe-Ser-Asp-Gly-Ala-Gln-Gly-Leu-Gly-Ala-Val-Gly,
Pro-Pro-Ser-Leu-Val-Gln-Asp-Asp-Ser-Phe-Glu-Asp-Arg-Gly,
Pro-Trp-Ile-Asn-Gly-Ala-Thr-Pro-Val-Lys-Ser-Ser-Ser-Gly,
Pro-Ala-His-Arg-Phe-Arg-Gly-Gly-Ser-Pro-Ala-Ile-Phe-Gly,
Pro-Ile-Met-Gly-Ala-Ser-His-Gly-Glu-Arg-Gly-Pro-Glu-Gly,
Pro-Asp-Ser-Leu-Arg-Arg-Arg-Glu-Gly-Phe-Gly-Leu-Leu-Gly,
Pro-Ser-Glu-Tyr-Pro-Gly-Leu-Ala-Thr-Gly-His-His-Val-Gly,
and Pro-Leu-Gly-Val-Leu-Gly-Ser-Ile-Trp-Leu-Glu-Arg-Gln-Gly.

The inserted sequences contained examples that were both
hydrophobic and hydrophilic, and examples that contained cysteine
residues. It appears that the active-site loop of thioredoxin
can tolerate a wide variety of peptide insertions resulting in
soluble fusion proteins. Standard procedures can be used to
purify these loop "inserts".
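
The remark that the inserts span hydrophobic and hydrophilic sequences, some with cysteine, can be checked numerically. The sketch below is not part of the original disclosure: it scores three of the listed inserts with the Kyte-Doolittle hydropathy scale and flags cysteine-containing ones. The choice of scale, the use of the mean score as a rough measure of hydrophobicity, and all function names are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values, keyed by three-letter residue code.
KD = {
    "Ala": 1.8, "Arg": -4.5, "Asn": -3.5, "Asp": -3.5, "Cys": 2.5,
    "Gln": -3.5, "Glu": -3.5, "Gly": -0.4, "His": -3.2, "Ile": 4.5,
    "Leu": 3.8, "Lys": -3.9, "Met": 1.9, "Phe": 2.8, "Pro": -1.6,
    "Ser": -0.8, "Thr": -0.7, "Trp": -0.9, "Tyr": -1.3, "Val": 4.2,
}

# Three of the twelve random inserts listed in this Example.
INSERTS = [
    "Pro-Leu-Gln-Arg-Ile-Pro-Pro-Gln-Ala-Leu-Arg-Val-Glu-Gly",
    "Pro-Arg-Asp-Cys-Val-Gln-Arg-Gly-Lys-Ser-Leu-Ser-Leu-Gly",
    "Pro-Leu-Gly-Val-Leu-Gly-Ser-Ile-Trp-Leu-Glu-Arg-Gln-Gly",
]


def residues(peptide):
    """Split a hyphen-separated three-letter peptide into residues."""
    return peptide.split("-")


def mean_hydropathy(peptide):
    """Average Kyte-Doolittle score; positive means more hydrophobic."""
    parts = residues(peptide)
    return sum(KD[r] for r in parts) / len(parts)


def has_cysteine(peptide):
    return "Cys" in residues(peptide)


for pep in INSERTS:
    print(len(residues(pep)), round(mean_hydropathy(pep), 2), has_cysteine(pep))
```

A mean score is a coarse summary; a sliding-window profile would be the more usual way to look for hydrophobic stretches within a loop insert.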

EXAMPLE 6 - HUMAN INTERLEUKIN-6

Human interleukin-6 (IL-6) was expressed at high levels in
E. coli as a thioredoxin fusion protein using an expression
vector similar to modified pALtrxA/EK/IL11ΔPro-581 described in
Example 3 above. To express a thioredoxin-IL-6 fusion the DNA
sequence in modified pALtrxA/EK/IL11ΔPro-581 encoding human IL-11
(nucleotides 2599-3132) is replaced by the 561 nucleotide DNA
sequence shown in Figure 6 encoding full-length, mature human
IL-6 [Hirano et al. Nature 324:73-76 (1986)]. The host strain and
expression protocol used for the production of thioredoxin-IL-6
fusion protein are as described in Example 1.
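
The coordinate-based swap described above can be sketched in a few lines of Python. This is an illustration only, not the authors' cloning procedure: `replace_segment` is a hypothetical helper, the toy sequence is made up, and the length bookkeeping assumes that the coordinates of Fig. 1 (which ends at nucleotide 3632) apply unchanged to the modified vector.

```python
def replace_segment(sequence: str, start: int, end: int, insert: str) -> str:
    """Replace 1-based, inclusive positions start..end of `sequence` with `insert`."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("coordinates outside sequence")
    return sequence[:start - 1] + insert + sequence[end:]


# Toy demonstration on a short made-up sequence (not the real vector).
toy_vector = "AAAACCCCGGGGTTTT"
toy_result = replace_segment(toy_vector, 5, 8, "TAGTAG")

# Length bookkeeping for the swap described in Example 6.
VECTOR_LENGTH = 3632               # last coordinate shown in Fig. 1E (assumed to apply)
IL11_START, IL11_END = 2599, 3132  # IL-11 segment named in the text
IL6_LENGTH = 561                   # IL-6 sequence of Fig. 6

removed = IL11_END - IL11_START + 1
new_length = VECTOR_LENGTH - removed + IL6_LENGTH
print(toy_result, removed, new_length)
```

Using 1-based inclusive coordinates mirrors the numbering printed beside the sequence in the figures, which avoids off-by-one slips when reading positions from them.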

When the fusion protein was synthesized at 37°C,
approximately 50% of it was found in the "inclusion body" or
insoluble fraction. However all of the thioredoxin-IL-6 fusion
protein, representing up to 10% of the total cellular protein,
was found in the soluble fraction when the temperature of
synthesis was lowered to 25°C.

EXAMPLE 7 - HUMAN MACROPHAGE COLONY STIMULATING FACTOR

Human Macrophage Colony Stimulating Factor (M-CSF) was
expressed at high levels in E. coli as a thioredoxin fusion
protein using the modified expression vector similar to
pALtrxA/EK/IL11ΔPro-581 described in Example 3 above.

The DNA sequence encoding human IL-11 in modified
pALtrxA/EK/IL11ΔPro-581 (nucleotides 2599-3135) is replaced by
the 669 nucleotide DNA sequence shown in Fig. 7 encoding the
first 223 amino acids of mature human M-CSFβ [G. G. Wong et al.
Science 235:1504-1508 (1987)]. The host strain and expression
protocol used for the production of thioredoxin-M-CSF fusion
protein was as described in Example 2 above.

As was seen with the thioredoxin-IL-11 fusion protein, all
of the thioredoxin-M-CSF fusion protein was found in the soluble
cellular fraction, representing up to 10% of the total protein.

EXAMPLE 8 - RELEASE OF FUSION PROTEIN VIA OSMOTIC SHOCK

To determine whether the fusions of heterologous proteins to
thioredoxin according to this invention enable targeting to the
host cell's adhesion sites and permit the release of the fusion
proteins from the cell, the cells were exposed to simple osmotic
shock and freeze/thaw procedures.

Cells overproducing wild-type E. coli thioredoxin, human
thioredoxin, the E. coli thioredoxin-MIP-1α fusion or the E. coli
thioredoxin-IL-11 fusion were used in the following procedures.

For an osmotic shock treatment, cells were resuspended at 2
A550/ml in 20 mM Tris-Cl pH 8.0/2.5 mM EDTA/20% w/v sucrose and
kept cold on ice for 10 minutes. The cells were then pelleted by
centrifugation (12,000 xg, 30 seconds) and gently resuspended in
the same buffer as above but with sucrose omitted. After an
additional 10 minute period on ice, to allow for the osmotic
release of proteins, cells were re-pelleted by centrifugation
(12,000 xg, 2 minutes) and the supernatant ("shockate") examined
for its protein content. Wild-type E. coli thioredoxin and human
thioredoxin were quantitatively released, giving "shockate"
preparations which were >80% pure thioredoxin. More
significantly >80% of the thioredoxin-MIP-1α and >50% of the
thioredoxin-IL-11 fusion proteins were released by this osmotic
treatment.
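
The buffer and cell-density figures in the osmotic shock procedure translate into simple bench arithmetic. The sketch below is not part of the original disclosure: it assumes Tris base (121.14 g/mol) and disodium EDTA dihydrate (372.24 g/mol) as the reagent forms, does not model the HCl needed to reach pH 8.0, and uses hypothetical function names.

```python
TRIS_BASE_MW = 121.14      # g/mol, assumed reagent form
EDTA_NA2_2H2O_MW = 372.24  # g/mol, assumed reagent form


def shock_buffer_grams(volume_ml, with_sucrose=True):
    """Masses for 20 mM Tris / 2.5 mM EDTA, plus 20% w/v sucrose if requested."""
    litres = volume_ml / 1000.0
    return {
        "tris_base_g": 0.020 * litres * TRIS_BASE_MW,
        "edta_g": 0.0025 * litres * EDTA_NA2_2H2O_MW,
        # 20% w/v means 20 g per 100 ml of final volume.
        "sucrose_g": 0.20 * volume_ml if with_sucrose else 0.0,
    }


def resuspension_volume_ml(culture_a550, culture_ml, target_a550=2.0):
    """Buffer volume that brings the harvested cells to the target A550/ml."""
    return culture_a550 * culture_ml / target_a550


recipe = shock_buffer_grams(100.0)
wash = shock_buffer_grams(100.0, with_sucrose=False)
print(recipe, wash, resuspension_volume_ml(4.0, 50.0))
```

The sucrose-free variant corresponds to the second resuspension step, and also to the buffer named for the freeze/thaw procedure.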

A simple freeze/thaw procedure produced similar results,
releasing thioredoxin fusion proteins selectively, while leaving
most of the other cellular proteins inside the cell. A typical
freeze/thaw procedure entails resuspending cells at 2 A550/ml in
20 mM Tris-Cl pH 8.0/2.5 mM EDTA and quickly freezing the
suspension in dry ice or liquid nitrogen. The frozen suspension
is then allowed to slowly thaw before spinning out the cells
(12,000 xg, 2 minutes) and examining the supernatant for protein.

Although the resultant "shockate" may require additional
purification, the initial "shockate" is characterized by the
absence of nucleic acid contaminants. Compared to an initial
lysate, the purity of the "shockate" is significantly better, and
does not require the difficult removal of DNA from bacterial
lysates.

Thus, this release step can be substituted for the lysis
step of Example 2. The supernatant obtained after centrifugation
is then further purified in the manner disclosed in that Example.

Numerous modifications and variations of the present
invention are included in the above-identified specification and
are expected to be obvious to one of skill in the art. Such
modifications and alterations to the compositions and processes
of the present invention are believed to be encompassed in the
scope of the claims appended hereto.

WHAT IS CLAIMED IS:

1. A DNA sequence encoding a fusion protein, said sequence
comprising DNA encoding a thioredoxin-like protein fused to a DNA
sequence encoding a selected heterologous protein.

2. A DNA sequence of claim 1 wherein said DNA encoding said
thioredoxin-like protein comprises the amino terminus of said
fusion protein.

3. A DNA sequence of claim 1 wherein said DNA encoding said
thioredoxin-like protein comprises the carboxyl terminus of said
fusion protein.

4. A DNA sequence of claim 1, 2 or 3 wherein said DNA encoding
said thioredoxin-like protein is selected from the group
consisting of E. coli thioredoxin and human thioredoxin.

5. A DNA sequence of claim 1, 2 or 3 wherein said DNA encoding
said selected protein is selected from the group consisting of
IL-11, IL-6, Macrophage Inhibitory Protein 1α and Bone
Morphogenic Protein 2.

6. A DNA sequence of claim 1, 2 or 3 additionally comprising a
linker DNA sequence fused between said DNA encoding said
thioredoxin-like protein and said DNA encoding said selected
heterologous protein.

7. A plasmid DNA molecule comprising a DNA sequence of claims
1-6, said sequence being under the control of a suitable
expression control sequence capable of directing the expression
of a fusion protein in a selected host cell.

8. An E. coli host cell transformed with, or having integrated
into the genome thereof, a plasmid of claim 7.

9. A method of making a selected heterologous protein
comprising

(a) culturing in a culture medium under suitable conditions
a host cell of claim 8;

(b) recovering the fusion protein produced thereby from
said culture medium;

(c) cleaving said selected heterologous protein from said
fusion protein; and

(d) isolating said selected heterologous protein.

10. IL-11 protein produced by the method of claim 9.

11. Use of thioredoxin in the method of claim 9.

FIGURE 1A

pALtrxA/EK/IL11ΔPro-581

GAOGAAAGGG CCTO6TG&2EA OGCCXiaTXTT TMMSGTTML 40 

TGTCATGATO ASAASGGnTT GXZaG2^0GTC AGCTG6GACT 80 

TTTOGGGGAA A!1H3X60(3GG6 AAOOOCZASX TOTXXACTTT 120 

TCXaAASAGA !nXSUykSZ!ATO XaXCOGCTCA TCAG&G&AIXA 160 

ACCCXGAIOraA A!TCCT!rC3^^ AAlXiATTG^^ AJkGGUVAGAGT 200 

AaffGAGXAIirTC AUGAITTTCOG TGTCGOCCITT AT3!CCCTTTT 240 

OTQCGGCATT TTGCCTTCCT GTTTTTGCTC ACCCA6AAAC 280 

GCT66T6A2UI GTAAAAGAITC CT6A&GATGA GTTGGGONSGA 320 

OGAGTCGGTT ACAX06JVACT GG»!rCTGAAC AGGGGSAAGA 360 

1*CCTZGAGAG STTT06000C GAASAAOGT3« TTCCAAteAIF 400 

GAGGACTCTT AAA6TTCSGC VASX3TOG06C GG3!ATTATCC 440 

0GTATXGA06 006GGGAAGA 6GAACTOGGT 0G006GATAC 480 

ACTATTCTGA GAA!F6ACTTG GTT6ACTACT GAOGAGTCAC 520 

AGAAAAGGAT CTXAC6GATO GGATGACA6T AAGAGAASDXA 560 

TGCAGTGCTG CCA!Z!AACCA!r GAGTGAHAAC ACTGCGGCCA 600 

ACTTACTTCT GACAACGATC GGAGGAGOGA AGGAGCTAAC 640 

C6Ca?TTTTTG CACAAGATOG 6GGATGATGT AACTCGCCTT 680 

GATCGTT6GG AACCGGAGCT Gl^AITGAAGCC ATACClkAACG 720 

AC6AGGGTGA CACCACGATG CCTGTAGGAA TGGCAACAAC 760 

GTTGCGCAAA CTATTiAACTG GCGAACTACT TACTCTAGCT 800 

TCCCGGCAAC AAXTAAXAGA CTGGATGGAG GC6GATAAAG 840 

TTGCAGGACC ACTTCTGCGC TCGGCCCTTC CGGCTGGCTG 880 

GTTTATTGCT GATAAATCTC GAGCCGGTGA GC6TGGGTCT 920 

CGCGGTATCA TTGCAGCACT GGGGCCAGAT GGTAA6CCCT 960 

CCCGTATCGT AGTTATCTAC AC6ACGGGGA GTGAGGCAAC 1000 

TATGGATGAA CGAAATAGAC AGATCGCTGA GATAGGTGCC 104 O 

FIGURE 1B



TCACTGATXA ACTOTCA6AC CAAGTTZAGT 1080 

CAZ&TATACT T1AGATT6AT TTAA^CTTC AZTTTTCAATT 1120 

SAAAA6GATC TCCTTTTTGA «EAATCTC&!X6 1160 

CrZAACGTGA 6CTTTC6TTC GACTCAGG6T 1200 

GAGA0CC06T AfiAAAAGATC AAAGG2^CTT CTTGA6ATCC 1240 

TTTTTTTCTO OGOGlSkATCT 6CT6CTT6CA AACAAAAAAA 1280 

CGACG6GX!AC CA60GGTOCT !C3X3TTT6CC6 GATGAAGA6C 1320 

TACCAACTCS TTTTCGSGAA6 GXAACT66CT OTGAGGAGAGC 1360 

6GAGAXACGA AAOACTGTCC TTCIIA6T6TA GOCGOSAGTXA 1400 

66CCACGACT TCAAGAACTC TOXA6GACCG CCTACATACC 1440 

TCGCTCT6CT A21TCCTCTTA CCA6TGGCT6 CT6CCA6TCG 1480 

C6A!irAAGTC6 TGTCTTACOG GGTTGGACTC AAGACGAXA6 1520 

TTACOGGATA AGGCGCAGCG 6TCGGGCT6A AGGGGGGGTT 1560 

CGTGGACACA 6CCCAGCTTG 6AGCGAA06A CCTACACCGA 1600 

ACXGAGAXAC CTACAGC6TO AGGATT6AGA AAGCGGCAOG 1640 

CTTCOOGAAG GGAGAAAGGC GGAGAGGXAT OCGGXAAGOG 1680 

GGAGGGTCGG AAGAGGAGAG CGCAOGAGGG A6CTTCCAGG 1720 

GGGAAAC6CC TG6TATCTOT ATAGTCCTGT C6GGTTTCGC 1760 

CACCTCTGAC TTGAGCGTCG ATTTTTGTGA TCCTC6TCA6 1800 

GGGGGGGGAG CCTATGGAAA AACGCCAGCA ACGCGGCCTT 1840 

TTTAOGGTTC CTGGCCTTTT GCTGGCCTTT TGCTCACATG 1880 

TTCTTTCCTG GGTTATCCCC TGATTCTGTG GATAACCGTA 1920 

TTACCGCCTT TGAGTGAGCT GATACCGCTC 6CCGGAGCOG I960 

AACGACCGAG CGGAG06AGT GAGTGAGCGA GGAAGC6GAA 2000 

GAGCGCCCAA TACGCAAACC GCCTCTCCCC GCGCGTTGGC 2040 

C6ATTCATTA ATGCAGAAOrr GATCTCTCAC CTACCAAACA 2080 

ATGCCCCCCT GCAAAAAATA AATTCATATA AAA2UICATAC 2120 

WO 92/13955    PCT/US92/00944

FIGURE 1C 



AGATAACGAT CT60G6TCAIF AAAXTATCTC T6GG66TCTT 2160 

GAGATA2L&XA CGACT6G06G TGAZACTCA6 GAGATCAGCA 2200 

GGACGCACTC ACCACCA3X3A AITTCAAGAA6 GA6ATAXAGA 2240 

T ATG AGC GAT AAA ATT ATT CAC CTG ACT GAC GAC 2274
Met Ser Asp Lys Ile Ile His Leu Thr Asp Asp
1 5 10

AGT TTT GAC ACG GAT GTA CTC AAA GCG GAC GGG 2307
Ser Phe Asp Thr Asp Val Leu Lys Ala Asp Gly
15 20

GCG ATC CTC GTC GAT TTC TGG GCA GAG TGG TGC 2340
Ala Ile Leu Val Asp Phe Trp Ala Glu Trp Cys
25 30

GGT CCG TGC AAA ATG ATC GCC CCG ATT CTG GAT 2373
Gly Pro Cys Lys Met Ile Ala Pro Ile Leu Asp
35 40

GAA ATC GCT GAC GAA TAT CAG GGC AAA CTG ACC 2406
Glu Ile Ala Asp Glu Tyr Gln Gly Lys Leu Thr
45 50 55

GTT GCA AAA CTG AAC ATC GAT CAA AAC CCT GGC 2439
Val Ala Lys Leu Asn Ile Asp Gln Asn Pro Gly
60 65

ACT GCG CCG AAA TAT GGC ATC CGT GGT ATC CCG 2472
Thr Ala Pro Lys Tyr Gly Ile Arg Gly Ile Pro
70 75

ACT CTG CTG CTG TTC AAA AAC GGT GAA GTG GCG 2505
Thr Leu Leu Leu Phe Lys Asn Gly Glu Val Ala
80 85

GCA ACC AAA GTG GGT GCA CTG TCT AAA GGT CAG 2538
Ala Thr Lys Val Gly Ala Leu Ser Lys Gly Gln
90 95

TTG AAA GAG TTC CTC GAC GCT AAC CTG GCC GGT 2571
Leu Lys Glu Phe Leu Asp Ala Asn Leu Ala Gly
100 105 110

TCT GGT TCT GGT GAT GAC GAT GAC AAA GGT CCA 2604
Ser Gly Ser Gly Asp Asp Asp Asp Lys Gly Pro
115 120

FIGURE 1D



CCA CCA GGT CCA CCT CGA GTT TCC CCA GAC CCT 2637
Pro Pro Gly Pro Pro Arg Val Ser Pro Asp Pro
125 130

CGG GCC GAG CTG GAC AGC ACC GTG CTC CTG ACC 2670
Arg Ala Glu Leu Asp Ser Thr Val Leu Leu Thr
135 140

CGC TCT CTC CTG GCG GAC ACG CGG CAG CTG GCT 2703
Arg Ser Leu Leu Ala Asp Thr Arg Gln Leu Ala
145 150

GCA CAG CTG AGG GAC AAA TTC CCA GCT GAC GGG 2736
Ala Gln Leu Arg Asp Lys Phe Pro Ala Asp Gly
155 160 165

GAC CAC AAC CTG GAT TCC CTG CCC ACC CTG GCC 2769
Asp His Asn Leu Asp Ser Leu Pro Thr Leu Ala
170 175

ATG AGT GCG GGG GCA CTG GGA GCT CTA CAG CTC 2802
Met Ser Ala Gly Ala Leu Gly Ala Leu Gln Leu
180 185

CCA GGT GTG CTG ACA AGG CTG CGA GCG GAC CTA 2835
Pro Gly Val Leu Thr Arg Leu Arg Ala Asp Leu
190 195

CTG TCC TAC CTG CGG CAC GTG CAG TGG CTG CGC 2868
Leu Ser Tyr Leu Arg His Val Gln Trp Leu Arg
200 205

CGG GCA GGT GGC TCT TCC CTG AAG ACC CTG GAG 2901
Arg Ala Gly Gly Ser Ser Leu Lys Thr Leu Glu
210 215 220

CCC GAG CTG GGC ACC CTG CAG GCC CGA CTG GAC 2934
Pro Glu Leu Gly Thr Leu Gln Ala Arg Leu Asp
225 230

CGG CTG CTG CGC CGG CTG CAG CTC CTG ATG TCC 2967
Arg Leu Leu Arg Arg Leu Gln Leu Leu Met Ser
235 240

CGC CTG GCC CTG CCC CAG CCA CCC CCG GAC CCG 3000
Arg Leu Ala Leu Pro Gln Pro Pro Pro Asp Pro
245 250

CCG GCG CCC CCG CTG GCG CCC CCC TCC TCA GCC 3033
Pro Ala Pro Pro Leu Ala Pro Pro Ser Ser Ala
255 260

FIGURE 1E

TGG GGG GGC ATC AGG GCC GCC CAC GCC ATC CTG 3066
Trp Gly Gly Ile Arg Ala Ala His Ala Ile Leu
265 270 275

GGG GGG CTG CAC CTG ACA CTT GAC TGG GCC GTG 3099
Gly Gly Leu His Leu Thr Leu Asp Trp Ala Val
280 285

AGG GGA CTG CTG CTG CTG AAG ACT CGG CTG TGA 3132
Arg Gly Leu Leu Leu Leu Lys Thr Arg Leu
290 295

AA6CTZATOG ATACC6TCGA CCTGCAGTAA TG6TACAGGG 3172 

TA6TACAAAT AAAAAAGGCA CGTCA(3ATGA C6T6CCTTTT 3212 

TTCTTGTGAG CAGTAAGCTT GGCACTGGCC 6TCGTTTTAC 3252 

AACGTCCTGA CTGGGAAAAC CCTGGC6TTA OCC^CTTAA 3292 

TCGCCTTGCA GCACATCCCC CTTTC60CAG CTGGC6TAAT 3332 

A6CGAAGAGG CCCGCACCGA TC6CCCTTCC CAACAGTT6C 3372 

6CAGCCTGAA TGGOGAATGG CGCCTCAT6C 6GTATTTTCT 3412 

CCTTAG6CAT CTGTGCGGTA TTTCAG&CCG CATATATGGT 3452 

6CACTCTCAG TACAATCTGC TCTGATGCCG CATA6TTAAG 3492 

CCA6CCCCGA CACCC6CCAA CACCC6CTGA CGGGCCCTGA 3532 

CGGGCTTGTC TGCTCCCGGC ATCCGCXTAC AGAGAAGCTG 3572 

TGACCGTCTC CGGGAGCTGC AT6TCTCAGA GGTTTTCACC 3612 

GTCATCACCG AAACGCGCGA 3632

FIGURE 2

GCA CCA CTT GCT GCT GAC ACG CCG ACC GCC TGC TGC 36
Ala Pro Leu Ala Ala Asp Thr Pro Thr Ala Cys Cys
1 5 10

TTC AGC TAC ACC TCC CGA CAG ATT CCA CAG AAT TTC 72
Phe Ser Tyr Thr Ser Arg Gln Ile Pro Gln Asn Phe
15 20

ATA GCT GAC TAC TTT GAG ACG AGC AGC CAG TGC TCC 109
Ile Ala Asp Tyr Phe Glu Thr Ser Ser Gln Cys Ser
25 30 35

AAG CCC AGT GTC ATC TTC CTA ACC AAG AGA GGC CGG 145
Lys Pro Ser Val Ile Phe Leu Thr Lys Arg Gly Arg
40 45

CAG GTC TGT GCT GAC CCC AGT GAG GAG TGG GTC CAG 181
Gln Val Cys Ala Asp Pro Ser Glu Glu Trp Val Gln
50 55 60

AAA TAC GTC AGT GAC CTG GAG CTG AGT GCC TAA 214
Lys Tyr Val Ser Asp Leu Glu Leu Ser Ala
65 70

FIGURE 3

BMP-2

CAA 6CT AAA CAT AAA GAA C6T AAA OCT CT6 AAA TCT 36 
Gin Ala Ziys His Lys Gin Axg Lys Arg lieu Iiys Ser 
1 5 10 

AGC TGT AAG AGA CAC CCT TTG TAG GTG 6AC TTC AST 72 
Ser Cys Lys Arg His Pro Leu Tyr Val Asp Phe Ser 
15 20 

GAG GTG GGG TGG AAT GAG TGG ATT GTG GGT GGG GCG 109 
Asp Val Gly Trp Asn Asp Txp lie Val Ala Pro Pro 
25 . 30 35 

GGG TAT GAG GGG TTT TAG TGG GAG GGA GAA TGG GGT 145 
Gly Tyr His Ala She Tyr Gys His Gly Glu pys Pro 
40 45 

TTT GGT GTG GGT GAT GAT GTG AAG TGG AGT AAT GAT 181 
She Pro Leu Ala Aap His Leu Asn Ser Thr Asn His 
50 55 60 

GGG ATT GTT GAG AGG TTG GTG AAC TGT GTT AAG TGT 217 
Ala Zle Val Gin Thr Leu Val Asn Ser Val Asn Ser 

65 70 

AAG ATT GGT AAG GGA TGG TGT GTC GGG ACA GAA GTG 253 
Lys Xle Pro I^fs Ala pys Gys Val Pro Thr Glu Leu 
75 80 

AGT GGT ATG TOG ATG GTG TAG GTT GAG GAG AAT GAA 289 
Ser Ala lie Ser Mel: Leu Tyr Leu Asp Glu Asn Glu 

85 90 95 

AAG GTT GTA TTA AAG AAG TAT GAG GAG ATG GTT GTG 325 
Lys Val Val Leu Lys Asn Tyr Gin Asp Hel: Val Val 
100 105 

GAG GGT TGT GGG TGT GGG TAG 346 
Glu Gly Gys Gly Cys Arg 
110

FIGURE 4

INSERTION OF AN ENTEROKINASE SITE INTO
THE ACTIVE-SITE LOOP OF E. COLI THIOREDOXIN (trxA)

                      RsrII
                        |
trxA active   ....GAGTGGTGCGGTCCGTGCAAAATG....
site loop     ....CTCACCACGCCAGGCACGTTTTAC....
              ....E  W  C  G  P  C  K  M ....
                  31                   38

RsrII cut     ....GAGTGGTGCG    GTCCGTGCAAAATG....
              ....CTCACCACGCCAG    GCACGTTTTAC....
              ....E  W  C  G    P  C  K  M ....
                  31                   38

Enterokinase site
(13 residues)

    gtcactccGACTACAAAGACGACGACGACAAAgct
       tgaggCTGATGTTTCTGCTGCTGCTGTTTcga
    ....H  S  D  Y  K  D  D  D  D  K  A  S  G....
                                   ^
                             cleavage site

FIGURE 5

RANDOM PEPTIDE INSERTIONS INTO THE ACTIVE-SITE
LOOP OF E. COLI THIOREDOXIN (trxA)



RsrII 
I 

, . . .G&6TG6T6CG6TCC6T6CAAAAT6. 

'trxA acbive '■- ■ ' i ^ ^— — — — — — — 

site loop . . • .CTC3UXAC9GOCAGGCaOGTTTTAC. 

....B W. C 6 P C K M . 
31 38 



.... GA6T66TG06 6TCC6T6CAAA&T6 . 

StsrII cut 

....CTCACCAOGCCftG ■ 6CAC6TTTTAC. 



....EHC6 PCXM 
31 38 



(Avail) Avail 
5» I I 3 

GACTGACTGCXOOG. . . (N35) . . .GGTCCTCAGTCAGTCAG 

CCAGCAGTCAGTCAGTC 
3« 5« 



vaaAom GTC06. • . (N3g) • • .6 

6C...(N3g)...CGAG 

Insairtlcn into tracA active site loop 

. . . .GA6TGCT6C66TCC6. . . (N35} . . .GGTCCGTGCAAAATG. 

. . . .CTGACCA06CCA66C. . . (V^^) • • .CCAGGCAC6TTTTAC. 

....E W C 6 P. .(Xio)* . G P C K M . 
31 38

FIGURE 6

IL-6

5 10 
AI6 GCT CCA 6TA CCT OCA GGT GAA GAT TCT AAA GAT 6TA 39 
JSBb Ala Pro Val Pro Pro 6ly Glu Asp Ser Xys Asp Val 

15 20 25 

GCC GCC CCA GAG A6A GAG CCA CTC ACC TCT TCA GAA C6A 78 
Ala Ala Pro His Arg Gin Pro lieu Thr Ser Ser Glu Arg 

30 35 
ATT GAG AAA CAA ATT GGG TAG ATG CTG GAG GGG ATG TCA 117 
lie Asp Lys Gin lie Arg Tyr He Iiou Asp Gly He Ser 

40 45 50 

GGG GTG AGA AAG GAG AGA TGT AAC AAG AGT AAG ATG TGT 156 
Ala Leu Arg Lys Glu Thr cys Asn Lys Ser Asn Mel: Gys 

55 60 
GAA AGG AGG AAA GAG GGA GTG GCA GAA AAG AAG GTG AAG 195 
Glu Ser Ser Lys Glu Ala Leu Ala Glu Asn Asn Leu Asn 

65 70 75 

CTT GGA AAG ATG GGT GAA AAA GAT GGA TGG TTG CAA TCT 234 
Leu Pro Lys K&t Ala Glu Lys Asp Gly Cys Plie Gin Ser 

80 85 90 

GGA TTG AAT GAG GAG ACT TGG CTG GIG AAA ATG ATG AGT 273 
Gly Phe Asn Glu Glu Thr cys Leu Val Lys Xle Zle Thr 

95 100 
GGT CTT TTG GAG TTT GAG GTA TAG GTA GAG TAG GTG GAG 312 
Gly Leu Leu Glu Phe Glu Val Tyr Leu Glu Thr Leu Gin 

105 110 115 

AAG AGA TTT GAG AGT AGT GAG GAA GAA GGG AGA GGT GTG 351 

Asn Arg Plxe Glu Ser Ser Glu Glu Gin Ala Arg Ala Val 



120 125 
GAG ATG AGT AGA AAA GTC GTG ATG GAG TTG GTG GAG AAA 390
Gln Met Ser Thr Lys Val Leu Ile Gln Phe Leu Gln Lys

FIGURE 6 (continued)

130 140 150 

HJ^G 6GA AA6 AAT CSA GAT 6CA ASA ACC ACC CCT GAC CCA 429 
Lys Ala Iiys Asn Ij&a Asp Ala Xle Thr Ollhr Pro Asp Fro 

155 160 
ACC ACA AAT 6CC A6C CT6 CT6 AC6 AA6 CT6 CA6 6CA CAG 468 
Tbr Thr Asn Ala Ser lieu lieu Xhr Lys Leu Gin Ala Gin 

170 175 180 

AAC CAG TGG CTG CAG GAC ATG ACA ACT GAT CTC ATT CTG 507 
Asn Gin Trp Leu Gin Asp Mel: Thr Thr His Leu lie Leu 



185 190 
06C AGC TTT AAG GAG TTC CTG CAG TCC AGC CTG AGG GCT 546 

Arg Ser She Lys Glu Phe Leu pin Ser Ser Leu Arg Ala 



195 

CTT CGG CAA ATG TAG 561
Leu Arg Gln Met *

FIGURE 7

1 GAAGaaGTTT CTGAAXAST6 TAGCCACAIT6 ATZGGGACTG GACACCTGCA
51 GTCTCTCGA6 GGGGTGATT6 AGAGTCAGAIT GGAGACCT06 TCCCAAATTA
101 CaTTTG&fiTT nSEAGACCAG GAAGAGTTGA AAGATOCAGT 6TGCTACCTT
151 AAGAAGGCaT TTCTCCT6GT ACAAGAGATA ATCGAGGAGA CCAT60GCTT
201 GA6AGATAAC ACCCCCAAT6 CCATCGCCAT T6T6CA6GT6 GAGGAACTCT
251 CTTTGA6GCT GAA6A6CTGC TTCACCAA6G ATZATGAAGA GGATGAGAA6
301 GCCT60CT0C GAACTTTCIA T6AGACAOCT CTCGA6TTGC TGGAGAAGG!F
351 G3kAG2U^I3!GTO GA&CTCGAAC GAAAGAATCT CCTX6AGAA6 CTGAAT6CTC 6ACTGGAATA
401 AAGA6CTTT6 GA6CGAAGAT
451 6TGGTC2kCGA A6CCTCATT6 CAACTGCCT6 ZACCCCAAAG CCATCCCTA6
501 C&GTGAC006 GCCTCTGTCT CCCCTCATCA GCCCCTCGCC CCCTCCATGG
551 CCCCTGTGGC TGGCTTGACC TGGGAGGACT CXOAGGGAAC T6A6G6GAGC
601 TCCCTCTT6C CTG6TGA6CA 6CCCCT6GAC ACAGTGGMX CAG6GAGT6C
651 GAA6GA6C66 CGACCCAG6